Efficient Support for Pipelining in Distributed Shared Memory Systems
Abstract
Though more difficult to program, distributed-memory parallel machines provide greater scalability than their shared-memory counterparts. Distributed Shared Memory (DSM) systems provide the abstraction of shared memory on a distributed machine. While DSMs provide an attractive programming model, they currently cannot efficiently support all classes of scientific applications. One such class consists of applications with recurrences that cause dependencies across processors or nodes. A popular solution to such problems is pipelining, which breaks the computation into blocks: each processor computes a block, which enables the next processor in the pipeline to compute its corresponding block. Once the pipeline is filled, the blocks are computed in parallel. While pipelining is useful, it is not efficiently supported by current DSM systems. This paper presents an approach to integrating pipelining into DSM systems. We describe our design and implementation of one-way pipelining in a DSM. The key idea is to retain the shared-memory model but design the extensions so that execution mimics what an explicit message-passing program would do. We show that one-way pipelining is superior to the two most common ways to program pipelined applications, distributed locks and explicit matrix transposition. Finally, we show that one-way pipelining is competitive with a hand-coded, explicit message-passing program.
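The pipelining scheme described in the abstract can be illustrated with a small sketch. It is not the paper's DSM implementation: it assumes a simple two-dimensional recurrence, uses Python threads as stand-ins for processors, and uses a queue as a one-way "block ready" notification from each worker to its downstream neighbour. The names `sequential`, `pipelined`, `links`, and `block` are hypothetical and chosen only for illustration.

```python
import threading
from queue import Queue


def sequential(n_rows, n_cols):
    """Reference: a[i][j] = a[i-1][j] + a[i][j-1]; row 0 and column 0 are 1."""
    a = [[1] * n_cols for _ in range(n_rows)]
    for i in range(1, n_rows):
        for j in range(1, n_cols):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


def pipelined(n_rows, n_cols, n_workers, block):
    """Same recurrence; rows are split into bands, columns into blocks."""
    a = [[1] * n_cols for _ in range(n_rows)]  # shared array (assumed model)
    work_rows = n_rows - 1  # row 0 is fixed input
    # bounds[p]..bounds[p+1] is the band of rows owned by worker p.
    bounds = [1 + (work_rows * p) // n_workers for p in range(n_workers + 1)]
    # links[p] carries one-way notifications from worker p-1 to worker p.
    links = [Queue() for _ in range(n_workers)]

    def worker(p):
        lo, hi = bounds[p], bounds[p + 1]
        for c0 in range(1, n_cols, block):
            c1 = min(c0 + block, n_cols)
            if p > 0:
                # Wait until the upstream band has finished this column block,
                # so the boundary row lo-1 is valid for columns c0..c1-1.
                links[p].get()
            for i in range(lo, hi):
                for j in range(c0, c1):
                    a[i][j] = a[i - 1][j] + a[i][j - 1]
            if p < n_workers - 1:
                # Release the downstream worker for the same column block.
                links[p + 1].put(c0)

    threads = [threading.Thread(target=worker, args=(p,)) for p in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a


if __name__ == "__main__":
    print(pipelined(9, 17, 3, 4) == sequential(9, 17))
```

Once the first worker has finished its first column block, the second worker can start on that block while the first moves on to the next one, which is the "filled pipeline" behaviour the abstract refers to. The block size trades notification overhead against the time needed to fill the pipeline.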
Similar resources
Data Dependence Boundary Row Boundary Row Node
Though more difficult to program, distributed-memory parallel machines provide greater scalability than their shared-memory counterparts. Distributed Shared Memory (DSM) systems provide the abstraction of shared memory on a distributed machine. While DSMs provide an attractive programming model, they currently cannot efficiently support all classes of scientific applications. One such class are tho...
Radish: Compiling Efficient Query Plans for Distributed Shared Memory
We present Radish, a query compiler that generates distributed programs. Recent efforts have shown that compiling queries to machine code for a single-core can remove iterator and control overhead for significant performance gains. So far, systems that generate distributed programs only compile plans for single processors and stitch them together with messaging. In this paper, we describe an ap...
System Software Support for Reducing Memory Latency on Distributed Shared Memory Multiprocessors
This paper overviews results from our recent work on building customized system software support for Distributed Shared Memory Multiprocessors. The mechanisms and policies outlined in this paper are connected with a single conceptual thread: they all attempt to reduce the memory latency of parallel programs by optimizing critical system services, while hiding the complex architectural details o...
Effiziente Implementierung eingebetteter Runge-Kutta-Verfahren durch Ausnutzung der Speicherzugriffslokalität
Embedded Runge-Kutta methods are among the most popular numerical solution methods for non-stiff initial value problems of ordinary differential equations. While possessing a simple computational structure, they provide desirable numerical properties and can adapt the step size efficiently. Therefore, embedded Runge-Kutta methods can often compute the solution function faster than other solution ...
Determining Asynchronous Pipeline Execution
Asynchronous pipelining is a form of parallelism in which processors execute different loop tasks (loop statements) as opposed to different loop iterations. An asynchronous pipeline schedule for a loop is an assignment of loop tasks to processors, plus an order on instances of tasks assigned to the same processor. This variant of pipelining is particularly relevant in distributed memory systems (...